10 research outputs found

    Online Analysis of Dynamic Streaming Data

    Get PDF
    This thesis on "Online Analysis of Dynamic Streaming Data" addresses the distance measurement of dynamic, semi-structured data in continuous data streams in order to enable analyses on these data structures already at runtime. To this end, a formalization of distance computation for static and dynamic trees is introduced and complemented by an explicit treatment of the dynamics of attributes of individual tree nodes. The real-time analysis based on this distance measurement is complemented by density-based clustering in order to demonstrate applications in clustering, classification, and anomaly detection. The results of this work rest on a theoretical analysis of the introduced formalization of distance measurements for dynamic trees. These analyses are supported by empirical measurements based on monitoring data of batch jobs from the batch system of the GridKa data and computing centre. The evaluation of the proposed formalization and of the real-time analysis methods built on top of it demonstrates the efficiency and scalability of the approach. It is further shown that taking attributes and attribute statistics into account is of particular importance for the quality of analysis results on dynamic, semi-structured data. The evaluation also shows that the quality of the results can be further improved by independently combining several distances. In particular, the results of this work enable the analysis of data that changes over time.
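
    As an illustrative sketch only (the thesis's actual formalization is not reproduced here), the following Python fragment shows the general idea of combining a structural tree distance with an attribute-based distance into one weighted measure; the node layout, the size-based structural placeholder, and the weighting are assumptions for illustration.

```python
# Illustrative sketch: combine a structural tree distance with an attribute
# distance, as a stand-in for the formalization described in the abstract.
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    attributes: Dict[str, float] = field(default_factory=dict)
    children: List["Node"] = field(default_factory=list)


def tree_size(node: Node) -> int:
    return 1 + sum(tree_size(c) for c in node.children)


def structural_distance(a: Node, b: Node) -> float:
    # Placeholder for a proper tree (edit) distance: size difference only.
    return float(abs(tree_size(a) - tree_size(b)))


def attribute_distance(a: Node, b: Node) -> float:
    # Compare root-node attribute values (hypothetical choice of statistic).
    keys = set(a.attributes) | set(b.attributes)
    return sum(abs(a.attributes.get(k, 0.0) - b.attributes.get(k, 0.0)) for k in keys)


def combined_distance(a: Node, b: Node, alpha: float = 0.5) -> float:
    # Independent combination of several distances via a weighted sum; a
    # precomputed matrix of such distances could then be fed to a density-based
    # clusterer, e.g. sklearn.cluster.DBSCAN(metric="precomputed").
    return alpha * structural_distance(a, b) + (1.0 - alpha) * attribute_distance(a, b)
```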

    Advancing throughput of HEP analysis work-flows using caching concepts

    Get PDF
    High throughput and short turnaround cycles are core requirements for efficient processing of data-intensive end-user analyses in High Energy Physics (HEP). Together with the tremendously increasing amount of data to be processed, this leads to enormous challenges for HEP storage systems, networks and the data distribution to computing resources for end-user analyses. Bringing data close to the computing resource is a very promising approach to overcome throughput limitations and improve the overall performance. However, achieving data locality by placing multiple conventional caches inside a distributed computing infrastructure leads to redundant data placement and inefficient usage of the limited cache volume. The solution is a coordinated placement of critical data on computing resources, which enables matching each process of an analysis work-flow to its most suitable worker node in terms of data locality and thus reduces the overall processing time. This coordinated distributed caching concept was realized at KIT by developing the coordination service NaviX, which connects an XRootD cache proxy infrastructure with an HTCondor batch system. We give an overview of the coordinated distributed caching concept and of experiences collected on a prototype system based on NaviX.
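
    As an illustration of the matching idea only (not the actual NaviX implementation, which coordinates an XRootD proxy cache with an HTCondor batch system), the hypothetical Python sketch below scores each worker node by how many of a job's input files it already caches and routes the job to the best match.

```python
from typing import Dict, Set


def best_worker(job_files: Set[str], worker_caches: Dict[str, Set[str]]) -> str:
    """Pick the worker node whose cache overlaps most with the job's input files."""
    return max(worker_caches, key=lambda worker: len(job_files & worker_caches[worker]))


# Hypothetical example: wn01 already caches most of the job's input files,
# so a coordinator following this heuristic would schedule the process there.
caches = {"wn01": {"a.root", "b.root"}, "wn02": {"c.root"}}
print(best_worker({"a.root", "b.root", "d.root"}, caches))  # -> wn01
```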

    Extending the distributed computing infrastructure of the CMS experiment with HPC resources

    Get PDF
    Particle accelerators are an important tool to study the fundamental properties of elementary particles. Currently the highest-energy accelerator is the LHC at CERN in Geneva, Switzerland. Each of its four major detectors, among them the CMS detector, produces dozens of petabytes of data per year to be analyzed by a large international collaboration. The processing is carried out on the Worldwide LHC Computing Grid, which spans more than 170 computing centers around the world and is used by a number of particle physics experiments. Recently the LHC experiments were encouraged to make increasing use of HPC resources. While Grid resources are homogeneous with respect to the Grid middleware used, HPC installations can differ greatly in their setup. In order to integrate HPC resources into the highly automated processing setups of the CMS experiment, a number of challenges need to be addressed. For processing, access to primary data and metadata as well as access to the software is required. At Grid sites all of this is provided by a set of services operated at each center. At HPC sites, however, many of these capabilities cannot be easily provided and have to be enabled in user space or by other means. HPC centers also often restrict network access to remote services, which is a further severe limitation. The paper discusses a number of solutions and recent experiences of the CMS experiment in including HPC resources in processing campaigns.
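
    As a rough, hypothetical sketch of what "enabled in user space" can mean in practice (this is not the CMS experiment's actual mechanism; the service endpoint and fallback path are assumptions), a job wrapper might probe whether remote services and the grid-provided software area are available and otherwise fall back to a locally staged copy:

```python
import os
import socket


def service_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if an outbound TCP connection to the remote service succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def software_root() -> str:
    """Prefer the grid-provided CVMFS software area if it is mounted; otherwise
    fall back to a copy staged into user space (fallback path is hypothetical)."""
    if os.path.isdir("/cvmfs/cms.cern.ch"):
        return "/cvmfs/cms.cern.ch"
    return os.path.expanduser("~/cms-software")


if __name__ == "__main__":
    # "conditions.example.org:8000" stands in for whatever remote metadata
    # service the job needs; it is a placeholder, not a real endpoint.
    print("metadata service reachable:", service_reachable("conditions.example.org", 8000))
    print("software root:", software_root())
```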

    Plastic ingestion by harbour porpoises Phocoena phocoena in the Netherlands: Establishing a standardised method

    No full text
    Stomach contents of harbour porpoises (Phocoena phocoena) collected in the Netherlands between 2003 and 2013 were inspected for the presence of plastic and other man-made litter. In 654 stomach samples the frequency of occurrence of plastic litter was 7%, with less than 0.5% additional presence of non-synthetic man-made litter. However, we show that when a dedicated standard protocol for the detection of litter is followed, a considerably higher percentage (15% of 81 harbour porpoise stomachs from the period 2010–2013) contained plastic litter. Results thus strongly depended on the methods used and the time period considered. Occurrence of litter in the stomach was correlated to the presence of other non-food remains like stones, shells, bog-wood, etc., suggesting that litter was often ingested accidentally when the animals foraged close to the bottom. Most items were small and were not considered to have had a major health impact. No evident differences in ingestion were found between sexes or age groups, with the exception that neonates contained no litter. Polyethylene and polypropylene were the most common plastic types encountered. Compared to earlier literature on the harbour porpoise and related species, our results suggest higher levels of ingestion of litter. This is largely due to the lack of dedicated protocols to investigate marine litter ingestion in previous studies. Still, the low frequency of ingestion and the minor number and mass of litter items found in harbour porpoises in the relatively polluted southern North Sea indicate that the species is not a strong candidate for annual monitoring of marine litter trends under the EU Marine Strategy Framework Directive. However, for longer-term comparisons and regional differences, with proper dedicated protocols applied, the harbour porpoise has specific use in quantifying litter presence in the benthic marine habitat, which is poorly studied for that specific objective.

    CLOVER-DBS: Algorithm-Guided Deep Brain Stimulation-Programming Based on External Sensor Feedback Evaluated in a Prospective, Randomized, Crossover, Double-Blind, Two-Center Study

    Get PDF
    BACKGROUND Recent technological advances in deep brain stimulation (DBS) (e.g., directional leads, multiple independent current sources) lead to an increasing DBS optimization burden. Techniques to streamline and facilitate programming could leverage these innovations. OBJECTIVE We evaluated the clinical effectiveness of algorithm-guided DBS programming based on wearable sensor feedback compared to standard-of-care DBS settings in a prospective, randomized, crossover, double-blind study in two German DBS centers. METHODS For 23 Parkinson's disease patients with clinically effective DBS, new algorithm-guided DBS settings were determined and compared to previously established standard-of-care DBS settings using UPDRS-III and motion sensor assessment. Clinical and imaging data with lead localizations were analyzed to evaluate characteristics of algorithm-derived programming compared to standard of care. Six different versions of the algorithm were evaluated during the study, and 10 subjects programmed with a uniform algorithm version were analyzed as a subgroup. RESULTS Algorithm-guided and standard-of-care DBS settings effectively reduced motor symptoms compared to the off-stimulation state. In the entire cohort, with heterogeneous algorithm versions, UPDRS-III scores were reduced significantly more with standard-of-care settings than with algorithm-guided programming. A subgroup with the latest algorithm version showed no significant differences in UPDRS-III between the two programming methods. Comparing active contacts in standard-of-care and algorithm-guided DBS settings, contacts in the latter showed larger location variability and were farther away from a literature-based optimal stimulation target. CONCLUSION Algorithm-guided programming may be a reasonable approach to replace monopolar review, enable less-trained health professionals to achieve satisfactory DBS programming results, or potentially reduce the time needed for programming. Larger studies and further improvements of algorithm-guided programming are needed to confirm these results.
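
    For illustration only (the CLOVER-DBS algorithm itself is not described in this abstract, and all names below are hypothetical), the following sketch shows the general pattern of sensor-guided programming: test candidate contact/amplitude settings, score each with a wearable-sensor symptom metric, and keep the best-scoring setting.

```python
from typing import Callable, List, Tuple


def sensor_guided_settings(
    contacts: List[str],
    amplitudes_mA: List[float],
    symptom_score: Callable[[str, float], float],
) -> Tuple[Tuple[str, float], float]:
    """Return the (contact, amplitude) pair with the lowest sensor-derived
    symptom score. Purely illustrative: real programming would also enforce
    side-effect thresholds, washout periods between settings, and safety limits.
    """
    best_setting = (contacts[0], amplitudes_mA[0])
    best_score = float("inf")
    for contact in contacts:
        for amplitude in amplitudes_mA:
            score = symptom_score(contact, amplitude)  # e.g. tremor/bradykinesia metric
            if score < best_score:
                best_setting, best_score = (contact, amplitude), score
    return best_setting, best_score
```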